Abstract

In this paper we consider the computational complexity of the following problems:
given a DFA or NFA representing a regular language $L$ over a finite alphabet $\Sigma$, is the
set of all prefixes (resp., suffixes, factors, subwords) of all words of $L$ equal to $\Sigma^*$? In
the case of testing universality for factors of languages represented by DFA's, we find
an interesting connection to Černý's conjecture on synchronizing words.

1 Introduction 

The complexity of deciding universality (i.e., whether a particular formal language over a
finite alphabet $\Sigma$ contains all of $\Sigma^*$) is a recurring theme in formal language theory [3].
Frequently it is the case that testing membership for a single word is easy, while testing
membership for all words simultaneously is hard. For example, in two classic papers, Bar-Hillel,
Perles, and Shamir proved that testing universality for context-free languages represented
by grammars is recursively unsolvable [1, Thm. 6.2 (a), p. 160], and Meyer and Stockmeyer
[9, Lemma 2.3, p. 127] proved that testing universality for regular languages represented by
nondeterministic finite automata is PSPACE-complete. (Also see [2, 5].)

Kozen [8, Lemma 3.2.3, p. 261] proved that determining whether the intersection of the
languages accepted by $n$ DFA's is empty is PSPACE-complete. By complementing each
DFA, we get

Lemma 1. The following decision problem is PSPACE-complete: 

Given $n$ DFA's $M_1, M_2, \ldots, M_n$, each with input alphabet $\Sigma$, is $\bigcup_{1 \le i \le n} L(M_i) = \Sigma^*$?

Another frequently occurring theme is looking at prefixes, suffixes, factors, and subwords
of languages. We say a word $y$ is a factor of a word $w$ if there exist words $x, z$ such that
$w = xyz$. If in addition $x = \varepsilon$, the empty word, then we say $y$ is a prefix of $w$; if $z = \varepsilon$,
we say $y$ is a suffix. Finally, we say $y$ is a subword of $w$ if we can write $y = a_1 a_2 \cdots a_n$
and $w = w_1 a_1 w_2 a_2 \cdots w_n a_n w_{n+1}$ for some letters $a_i \in \Sigma$ and words $w_i \in \Sigma^*$. (In the
literature, what we call factors are sometimes called "subwords" and what we call subwords
are sometimes called "subsequences".)

Let $L \subseteq \Sigma^*$ be a language. We define

$\mathrm{Pref}(L) = \{x \in \Sigma^* : \text{there exists } y \in L \text{ such that } x \text{ is a prefix of } y\}$,

and in a similar manner we define $\mathrm{Suff}(L)$, $\mathrm{Fact}(L)$, and $\mathrm{Subw}(L)$ for suffixes, factors, and
subwords.
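The four operators can be made concrete on finite languages. The following is a minimal sketch, assuming words are Python strings; the function names (`prefixes`, `suffixes`, `factors`, `subwords`, `lift`) are illustrative and do not come from the paper.

```python
from itertools import combinations

def prefixes(w):
    """All prefixes of w, including the empty word and w itself."""
    return {w[:i] for i in range(len(w) + 1)}

def suffixes(w):
    """All suffixes of w."""
    return {w[i:] for i in range(len(w) + 1)}

def factors(w):
    """All contiguous blocks y with w = x y z."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def subwords(w):
    """All scattered subwords (subsequences) of w."""
    return {"".join(w[i] for i in idx)
            for r in range(len(w) + 1)
            for idx in combinations(range(len(w)), r)}

def lift(op, language):
    """Apply a word-level operator to every word of a finite language."""
    return set().union(*(op(w) for w in language))
```

For instance, `"ac"` is a subword of `"abc"` but not a factor of it, which is exactly the distinction drawn in the terminology note above.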

In this paper we combine these two themes, and examine the computational complexity 
of testing universality for the prefixes, suffixes, factors, and subwords of a regular language. 
As we will see, the complexity depends both on how the language is represented (say, by a
DFA or NFA), and on the particular type of factor or subword involved. In the case where we
are testing universality for factors of a language represented by a DFA, we find an interesting
connection with Černý's celebrated conjecture on synchronizing words.

Let us briefly mention some motivation for examining these questions. First, they are
related to natural questions involving infinite words. By $\Sigma^\omega$ we mean the set of all
right-infinite words over $\Sigma$, that is, infinite words of the form $a_0 a_1 a_2 \cdots$, where $a_i \in \Sigma$ for all
integers $i \ge 0$. Similarly, by ${}^\omega\Sigma$ we mean the set of all left-infinite words over $\Sigma$, that is,
infinite words of the form $\cdots a_2 a_1 a_0$. Finally, by ${}^\omega\Sigma^\omega$ we mean the set of all (unpointed)
bi-infinite words of the form $\cdots a_{-2} a_{-1} a_0 a_1 a_2 \cdots$, where two words are considered the same
if one is a finite shift of the other.

Given a language of finite words $L \subseteq \Sigma^*$, we define $L^\omega = \{x_1 x_2 x_3 \cdots : x_i \in L - \{\varepsilon\}\}$,
the language of right-infinite words generated by $L$. In a similar way we can define ${}^\omega L$ and
${}^\omega L^\omega$.

Given a finite set of finite words $S$, it is a natural question whether all right-infinite words
(resp., left-infinite words, bi-infinite words) can be generated using only words of $S$. The
following results are not difficult to prove using the usual argument from König's infinity
lemma or a compactness argument:

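Theorem 2. Let $S \subseteq \Sigma^*$ be a finite set of finite words over the finite alphabet $\Sigma$. Then

(a) $S^\omega = \Sigma^\omega$ iff $\mathrm{Pref}(S^*) = \Sigma^*$.

(b) ${}^\omega S = {}^\omega\Sigma$ iff $\mathrm{Suff}(S^*) = \Sigma^*$.

(c) ${}^\omega S^\omega = {}^\omega\Sigma^\omega$ iff $\mathrm{Fact}(S^*) = \Sigma^*$.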

This theorem, then, leads naturally to the questions on prefixes, suffixes, and factors 
considered in this paper. 

Another motivation is the following: as is well known, the following decision problem is
recursively unsolvable [10]:

Given a finite set of square matrices of the same dimension, with integer entries, decide 
if some product of them is the all-zeros matrix. 

On the other hand, if the integer matrices are replaced by Boolean matrices, and the
multiplication is Boolean matrix multiplication, the problem is evidently solvable, as there
are only finitely many different possibilities to consider. We will show in Corollary 10 below
that the decision problem for Boolean matrices is PSPACE-complete.

2 Basic observations 

We recall some observations from [6]. 

Given a DFA or NFA $M = (Q, \Sigma, \delta, q_0, F)$, we can easily construct NFA's accepting
$\mathrm{Pref}(L(M))$, $\mathrm{Suff}(L(M))$, $\mathrm{Fact}(L(M))$, and $\mathrm{Subw}(L(M))$, as follows:

To accept $\mathrm{Pref}(L(M))$ with $M' = (Q, \Sigma, \delta, q_0, F')$, we simply change the set of final states
as follows: a state $q$ is in $F'$ if and only if there is a path from $q$ to a state of $F$. Note that
in this case, if $M$ is a DFA, then so is $M'$.

To accept $\mathrm{Suff}(L(M))$, we simply change the set of initial states as follows: a state $q$ is
initial if and only if there is a path from $q_0$ to $q$. This construction creates a "generalized"
NFA which differs from the standard definition of NFA in that it allows an arbitrary set
of initial states $I$, instead of just a single initial state. To get around this problem, we
can simply create a new initial state and $\varepsilon$-transitions to all the states of $I$, and use the
standard algorithm to get rid of the $\varepsilon$-transitions without increasing the number of states

S- 

To accept Fact(L(M)), we do both of the transformations given above. In fact, there is 
an even simpler way to create a "generalized NFA" accepting Fact (L(M)): starting with M, 
remove all states not reachable from the initial state, and remove all states from which one 
cannot reach a final state. Then make all the remaining states both initial and final. 

To accept $\mathrm{Subw}(L(M))$, we add $\varepsilon$-transitions linking every pair of states for which there
is an ordinary transition. This produces an NFA-$\varepsilon$, and again the $\varepsilon$-transitions can easily be
removed without increasing the number of states.
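As an illustration of the simpler construction for factors, here is a minimal sketch that trims an automaton and then treats every remaining state as both initial and final. The representation (a dictionary from `(state, letter)` to a set of states) and the names `reachable`, `factor_nfa`, and `is_factor` are assumptions made for this example only.

```python
def reachable(starts, edges):
    """States reachable from `starts` along the adjacency map `edges`."""
    seen, stack = set(starts), list(starts)
    while stack:
        p = stack.pop()
        for q in edges.get(p, ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def factor_nfa(delta, q0, finals):
    """Trim M: keep states reachable from q0 that can also reach a final state.

    delta maps (state, letter) -> set of states. Every kept state is meant
    to be both initial and final in the generalized NFA for Fact(L(M)).
    """
    fwd, bwd = {}, {}
    for (p, _letter), targets in delta.items():
        for q in targets:
            fwd.setdefault(p, set()).add(q)
            bwd.setdefault(q, set()).add(p)
    useful = reachable({q0}, fwd) & reachable(set(finals), bwd)
    trimmed = {(p, a): ts & useful
               for (p, a), ts in delta.items() if p in useful}
    return useful, trimmed

def is_factor(word, useful, trimmed):
    """Simulate the generalized NFA: start everywhere, accept anywhere."""
    current = set(useful)
    for a in word:
        current = set().union(*(trimmed.get((p, a), set()) for p in current))
    return bool(current)
```

With a four-state complete DFA for the single word `ab` (state 3 being the dead state), the trimmed automaton keeps three states and accepts exactly the factors of `ab`.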

3 Universality for DFA's



In this section we assume that our regular language is represented by a DFA $M = (Q, \Sigma, \delta, q_0, F)$.
We assume our DFA is complete, that is, that $\delta : Q \times \Sigma \to Q$ is well-defined for all elements
of its domain.

3.1 Universality for prefixes 

Of all our results, universality for $\mathrm{Pref}(L)$ when $L$ is represented by a DFA is the easiest to decide. By a
well-known construction, given a DFA $M$ accepting $L$, we can create a new DFA $M'$ as follows:
$M' = (Q, \Sigma, \delta, q_0, Q - F')$, where $F' = \{q \in Q : \text{there exists a path from } q \text{ to an element of } F\}$.
Then $L(M') = \overline{\mathrm{Pref}(L(M))}$, the complement of $\mathrm{Pref}(L(M))$. Furthermore, we can determine $F'$ in linear time as follows:
we reverse all arrows in $M$, add a new state $q'$ with an arrow to each final state in $F$, and
determine which states are reachable from $q'$. The resulting set of states equals $F'$. Now
$\mathrm{Pref}(L) = \Sigma^*$ if and only if $L(M') = \emptyset$, and this latter condition can easily be tested by
using depth-first search in $M'$, starting from its initial state. We have proved:

Theorem 3. Given a DFA $M$ with input alphabet $\Sigma$, we can test if $\mathrm{Pref}(L(M)) = \Sigma^*$ in
linear time.
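The argument behind Theorem 3 can be sketched directly as two graph searches. This is a minimal sketch under stated assumptions: the DFA is complete, `delta` is a dictionary from `(state, letter)` to a state, and the function name `pref_universal` is hypothetical.

```python
def pref_universal(alphabet, delta, q0, finals):
    """Decide whether Pref(L(M)) = Sigma^* for a complete DFA M."""
    # Reverse the arrows so that F' is found by one backward search from F.
    rev = {}
    for (p, _letter), q in delta.items():
        rev.setdefault(q, []).append(p)
    live = set(finals)            # grows into F'
    stack = list(finals)
    while stack:
        q = stack.pop()
        for p in rev.get(q, ()):
            if p not in live:
                live.add(p)
                stack.append(p)
    # M' has final states Q - F'; its language is empty iff no state
    # outside F' is reachable from q0.
    seen, stack = {q0}, [q0]
    while stack:
        p = stack.pop()
        if p not in live:
            return False
        for a in alphabet:
            q = delta[(p, a)]
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return True
```

Both searches visit each transition a bounded number of times, which matches the linear-time claim.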

3.2 Universality for suffixes 

Universality for suffixes is, perhaps surprisingly, much more difficult. 

Theorem 4. The decision problem 

Given a DFA $M$ with input alphabet $\Sigma$, is $\mathrm{Suff}(L(M)) = \Sigma^*$?
is PSPACE-complete. 

Proof. Suppose $M$ has $n$ states. To see that the decision problem is in PSPACE, note that
by the results in Section 2 we can convert $M$ to an NFA $M'$ accepting $\mathrm{Suff}(L(M))$, having
only one more state than $M$. As we noted above, the universality decision problem for NFA's
is in PSPACE.

Now we show that the decision problem is PSPACE-hard. To do so, we reduce from
the following well-known PSPACE-complete problem: Given $n$ DFA's $M_0, M_1, \ldots, M_{n-1}$, is
there a word accepted by all of them? More precisely, we reduce from the following problem:
given $n$ DFA's $M_0, M_1, \ldots, M_{n-1}$, is the union of all their languages equal to $\Sigma^*$?

Suppose $M_i = (Q_i, \Sigma, \delta_i, q_0^i, F_i)$ for $0 \le i < n$. Without loss of generality, we assume no
$M_i$ has transitions into the initial state; if this condition does not hold, we alter $M_i$ to add
a new initial state and transitions out of this initial state that coincide with those of the original
initial state. Let $a, c$ be letters not in $\Sigma$, and let $\Delta = \Sigma \cup \{a, c\}$. We create a new DFA
$M = (Q, \Delta, \delta, q, F)$ which is illustrated in Figure 1 below.

Figure 1: The reduction for suffixes



The idea of the construction is as follows: our new machine $M$ incorporates all the
automata $M_0, M_1, \ldots, M_{n-1}$, but we change all states of each $M_i$ to final. For each $M_i$,
we add two new states, $r_i$ (nonaccepting) and $s_i$ (accepting). Each formerly nonaccepting
state of $M_i$ has a transition on $c$ to $r_i$, and each accepting state has a transition on $c$ to $s_i$;
this is illustrated in Figure 1 with the states labeled "N" (for nonaccepting) and "A" (for
accepting). Each state of $M_i$, other than the initial state, has a transition on $a$ back to the
initial state.

Each of the $M_i$ is linked via their initial states; $q_0^i$ is linked to $q_0^{(i+1) \bmod n}$ with a transition
on $a$. There are also transitions on each letter in $\Sigma \cup \{c\}$ from both $r_i$ and $s_i$ to $s_i$. There
are also transitions on $a$ from both $r_i$ and $s_i$ to $q_0^i$.

The reader should check that $M$ is actually a complete DFA, and furthermore $M$ accepts
all words, except possibly some of those that end in a word of the form $axc$, where $x$ is
rejected by some $M_i$. We now prove that $\mathrm{Suff}(L(M)) = \Delta^*$ iff $\bigcup_{0 \le i < n} L(M_i) = \Sigma^*$.

Suppose $\mathrm{Suff}(L(M)) = \Delta^*$. Then every word in $\Delta^*$ is a suffix of some word in $L(M)$. In
particular, this is true for every word of the form $awc$, where $w \in \Sigma^*$. So $yawc$ is accepted
by $M$ for some $y$ (depending on $w$). However, every transition on $a$ leads to a state of the
form $q_0^i$ for some $i$. Transitions on elements of $\Sigma$ then keep us inside the copy of $M_i$, and
then processing $c$ leads to either $r_i$ or $s_i$, depending on whether $M_i$ rejects or accepts $w$,
respectively. Since $yawc$ is accepted, this means that processing $wc$ ends in $s_i$, so $w$ is accepted by $M_i$.
Since $w$ was arbitrary, this shows that every word is accepted by some $M_i$.

On the other hand, suppose $\bigcup_{0 \le i < n} L(M_i) = \Sigma^*$. We need to show each $x \in \Delta^*$ is a
suffix of some word accepted by $M$. If $x$ contains no $a$'s, then it is accepted by $M$ by a loop
on the initial state $q$. Otherwise, we can write $x = yaz$, where $z$ contains no $a$'s. Then in
processing $x$, reading $a$ leads us to some state of the form $q_0^i$. If $z$ also contains no $c$'s, then
processing $x$ in its entirety leads to a state of $M_i$, all of which have been made accepting in
our construction. Thus $x$ is accepted. Otherwise, we can write $z = vcw$, where $v$ contains
no $c$'s. If $w$ is nonempty, then processing $x$ leads to the state $s_i$, which is accepting, and so
$x$ is accepted. Thus we may assume $w$ is empty and $z = vc$ for some $v \in \Sigma^*$. If reading
$x = yavc$ leads to $s_i$ for some $i$, then $x$ is accepted by $M$. Otherwise, reading $x$ leads to $r_i$.
By hypothesis $v$ is accepted by some $M_j$. Let $s = (j - i) \bmod n$, and consider $a^s ccx$. Then
reading $a^s cc$ leads to $s_{(j-i) \bmod n}$. Hence reading $a^s ccx$ leads to $s_j$, and it is accepted, and so
$x \in \mathrm{Suff}(L(M))$.

□ 

3.3 Universality for factors 

Theorem 5. The decision problem 

Given a DFA $M$ with input alphabet $\Sigma$, is $\mathrm{Fact}(L(M)) = \Sigma^*$?
is solvable in polynomial time.

Proof. Terminology: we say a state q is dead if no accepting state can be reached from q via 
a possibly empty path. If a DFA has a dead state d then every state reachable from it is 
also dead, so there is an equivalent DFA with only one dead state and all transitions from 
that dead state lead to itself. 

We say a state r is universal if no dead state is reachable from it via a possibly empty 
path. We say a state is reachable if there is some path to it from the start state. We say 
a DFA is initially connected if all states are reachable. A DFA $M = (Q, \Sigma, \delta, q_0, F)$ has a
synchronizing word $w$ if $\delta(p, w) = \delta(q, w)$ for all states $p, q$.

We need two lemmas. 

Lemma 6. If a DFA $M$ has a reachable universal state, then $\mathrm{Fact}(L(M)) = \Sigma^*$.

Proof. Let $M = (Q, \Sigma, \delta, q_0, F)$. Let $q$ be a reachable universal state, and let $x$ be such that
$\delta(q_0, x) = q$. Consider any word $y$, and let $\delta(q, y) = r$. Then no dead state is reachable
from $r$, for otherwise it would be reachable from $q$. So there exists a word $z$ such that
$\delta(r, z) = s$, and $s$ is an accepting state. Then $\delta(q_0, xyz) = s$, so $xyz$ is accepted, and hence
$y \in \mathrm{Fact}(L(M))$. But $y$ was arbitrary, so $\mathrm{Fact}(L(M)) = \Sigma^*$. □

Lemma 7. Suppose the DFA $M$ is initially connected, has no universal states, and has
exactly one dead state. Then there exists $x \notin \mathrm{Fact}(L(M))$ if and only if there is a synchronizing
word for $M$.

Proof. Suppose $M$ has a synchronizing word $x$. Then there exists a state $q$ such that for
all states $p$ we have $\delta(p, x) = q$. Since, as noted above, all transitions from the unique dead
state $d$ must go to itself, we must have $q = d$. Then for all states $p$ we have $\delta(p, x) = d$. So
$x$ is not in $\mathrm{Fact}(L(M))$, because every path labeled $x$ goes to a state from which one cannot
reach a final state.

Now suppose there is $x \notin \mathrm{Fact}(L(M))$. Then for all $y, z$ we have $yxz \notin L(M)$. In other
words, no matter what state we start in, $xz$ leads to a nonaccepting state. Then no matter
what state we start in, $x$ leads to a state from which no accepting state can be reached. But
there is only one such state, the dead state $d$. So it must be the case that $x$ always leads to
$d$, and so $x$ is a synchronizing word. □

We can now prove the theorem. The following algorithm decides whether $\mathrm{Fact}(L(M)) = \Sigma^*$
in polynomial time:

1. Remove all states not reachable from the start state by a (possibly empty) directed 
path. 

2. Identify all dead states via depth-first search. If M has at least one dead state, modify 
M to replace all dead states with a single dead state d. 

3. Identify all universal states via depth-first search. If there is a universal state, answer 
"Yes" and halt. 

4. Using the polynomial-time procedure mentioned in Volkov's survey [14], decide if there
is a synchronizing word. If there is, answer "No"; otherwise answer "Yes".

To see that it works, we already observed that we can replace all dead states by a single
dead state without changing the language accepted by $M$. Furthermore, if a DFA has no
universal states, then it has at least one dead state (for otherwise every state would be
universal). So when we reach step 4 of the algorithm, we are guaranteed that $M$ has exactly
one dead state and we can apply Lemma 7. □
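The four steps can be sketched as follows. This is a minimal sketch, not the authors' implementation: the DFA is assumed complete with `delta` mapping `(state, letter)` to a state, the names `closure`, `has_synchronizing_word`, and `fact_universal` are hypothetical, and step 4 uses the standard pairwise-merging criterion (a complete DFA has a synchronizing word iff every pair of states can be sent to a single state), which is assumed here to be the kind of polynomial-time procedure the survey describes.

```python
from itertools import combinations

def closure(starts, adj):
    """All nodes reachable from `starts` in the adjacency map `adj`."""
    seen, stack = set(starts), list(starts)
    while stack:
        p = stack.pop()
        for q in adj.get(p, ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def has_synchronizing_word(states, alphabet, delta):
    """Pairwise-merging test, via a backward search on the graph of pairs."""
    rev = {}
    for p, q in combinations(states, 2):
        for a in alphabet:
            image = frozenset((delta[(p, a)], delta[(q, a)]))
            rev.setdefault(image, []).append(frozenset((p, q)))
    merged = [t for t in rev if len(t) == 1]      # images that collapsed
    mergeable = closure(merged, rev)
    return all(frozenset(pair) in mergeable
               for pair in combinations(states, 2))

def fact_universal(alphabet, delta, q0, finals):
    """Decide whether Fact(L(M)) = Sigma^* for a complete DFA M."""
    fwd, bwd = {}, {}
    for (p, _letter), q in delta.items():
        fwd.setdefault(p, []).append(q)
        bwd.setdefault(q, []).append(p)
    reach = closure({q0}, fwd)                          # step 1
    live = closure(set(finals) & reach, bwd) & reach
    dead = reach - live                                 # step 2
    if reach - closure(dead, bwd):                      # step 3: universal state
        return True
    sink = object()            # fresh name for the single merged dead state
    states = list(live) + [sink]
    merged = {}
    for a in alphabet:
        merged[(sink, a)] = sink
        for p in live:
            q = delta[(p, a)]
            merged[(p, a)] = q if q in live else sink
    return not has_synchronizing_word(states, alphabet, merged)  # step 4
```

The pair graph has quadratically many nodes, so the whole procedure stays polynomial in the number of states.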

3.4 Universality for subwords 

This is covered in Section 4.4 below.

4 Universality for NFA's 

In this section we consider the same problems as before, but now we represent our regular 
language by an NFA. Some of these results essentially appeared in [6], but with different 
proofs and some in weaker form. 

4.1 Universality for prefixes

Theorem 8. The decision problem 

Given an NFA $M$ with input alphabet $\Sigma$, is $\mathrm{Pref}(L(M)) = \Sigma^*$?
is PSPACE-complete. 

Proof. In fact, this decision problem is even PSPACE-complete when $M$ is restricted to
be of the form $A^R$, where $A$ is a DFA. To see this, note that our construction for suffix
universality for DFA's given above, when reversed, gives an NFA $M$ with the property that
$\mathrm{Pref}(L(M)) = \Sigma^*$ if and only if $\bigcup_{0 \le i < n} L(M_i) = \Sigma^*$. □

4.2 Universality for suffixes 

Already handled in Section 3.2.

4.3 Universality for factors 

Although, as we have seen, universality for $\mathrm{Fact}(L(M))$ is testable in polynomial time when
$M$ is a DFA, the same decision problem becomes PSPACE-complete when $M$ is an NFA. To
see this, we again reduce from the universality problem for $n$ DFA's. Figure 2 illustrates the
construction. Given the DFA's $M_0, M_1, \ldots, M_{n-1}$, each with input alphabet $\Sigma$, we create
a new NFA $M$ as illustrated. We assume that $\Sigma$ does not contain the letters $a, c$ and set
$\Delta := \Sigma \cup \{a, c\}$. Restricting our attention to the states $q, r, s$ we get an NFA that accepts
all words not having a word of the form $a\Sigma^*c$ as a factor. On the other hand, a word of the
form $awc$ for $w \in \Sigma^*$ is a factor of a word in $L(M)$ iff $w \in L(M_i)$ for some $i$. We now claim
that $\mathrm{Fact}(L(M)) = \Delta^*$ iff $\bigcup_{0 \le i < n} L(M_i) = \Sigma^*$.

Suppose $\mathrm{Fact}(L(M)) = \Delta^*$. Then in particular every word of the form $awc$, with $w \in \Sigma^*$,
is a factor of a word of $L(M)$. But the only way such a word can be a factor is by entering one
of the $M_i$ components on $a$ and exiting on $c$, and there are only transitions on $c$ on states
that were originally final in $M_i$. So $w$ must be accepted by some $M_i$. Since $w$ was arbitrary,
we have $\bigcup_{0 \le i < n} L(M_i) = \Sigma^*$.

On the other hand, suppose $\bigcup_{0 \le i < n} L(M_i) = \Sigma^*$. We claim every word $x$ in $\Delta^*$ is in
$\mathrm{Fact}(L(M))$. To see this, note that if $x$ contains no factor of the form $awc$, with $w \in \Sigma^*$,
then it is accepted by a path starting from state $q$ and only involving the states $q$, $r$, and $s$.
Otherwise $x$ contains a factor of the form $awc$. Identify all the positions of $c$'s in $x$ and write

$x = x_1 c x_2 c \cdots x_{m-1} c x_m$,

where each $x_i \in (\Sigma \cup \{a\})^*$. If an $x_i$ contains no $a$'s, there is a path
from $q$ to $q$ labeled $x_i c$. Otherwise $x_i$ contains at least one $a$. Identify the position of the
last $a$ in $x_i$, and write $x_i = y_i a z_i$, where $z_i$ contains no $a$'s. Then starting in $q$ and reading
$y_i$ takes us to either state $q$, $r$, or $s$; reading the $a$ takes us to $t$ and then to any $q_0^j$. Since
$z_i \in \Sigma^*$, and since $\bigcup_{0 \le i < n} L(M_i) = \Sigma^*$, we can choose the particular $M_j$ that accepts $z_i$.
Then reading $c$ takes us back to state $q$. By this argument we see that $xc$ is always accepted
by $M$, and hence $x$ is a factor of $L(M)$.

Figure 2: The reduction for factors

Theorem 9. The decision problem 

Given an NFA $M$ with input alphabet $\Sigma$, is $\mathrm{Fact}(L(M)) = \Sigma^*$?
is PSPACE-complete. 

As we mentioned in the introduction, this result has an interesting interpretation in terms
of Boolean matrices. Given an NFA $M = (Q, \Sigma, \delta, q_0, F)$, we can form $|\Sigma|$ different matrices
$M_a$, one for each $a \in \Sigma$, as follows: $M_a$ has a 1 in row $i$ and column $j$ if $q_j \in \delta(q_i, a)$, and a 0
otherwise. Then it is easy to see that for all words $w = c_1 c_2 \cdots c_k$, the product $M_w := M_{c_1} M_{c_2} \cdots M_{c_k}$
has a 1 in row $i$ and column $j$ iff $q_j \in \delta(q_i, w)$.

Assume that $M$ is an NFA in which every state is reachable from the start state and
that a final state can be reached from every state. (If $M$ does not fulfill these conditions,
we can simply delete the appropriate states.) Then form $M_a$ for each $a \in \Sigma$. We claim
that some product of the $M_a$ equals the all-zeros matrix iff $\mathrm{Fact}(L(M)) \ne \Sigma^*$. For suppose
there is some product, say $M_y$ for $y = c_1 \cdots c_k$, that equals the all-zeros matrix. Then no
matter what state we start in, reading $y$ takes us to no state, so $xyz$ is rejected for all $x, z$.
Hence $y \notin \mathrm{Fact}(L(M))$. On the other hand, if $\mathrm{Fact}(L(M)) \ne \Sigma^*$, then there must be some
$y \notin \mathrm{Fact}(L(M))$. We claim $M_y$ is the all-zeros matrix. If not, there exist $i, j$ such that $M_y$
has a 1 in row $i$ and column $j$. Then since every state is reachable from the start state, there
exists $x$ such that $q_i \in \delta(q_0, x)$. Since a final state can be reached from every state, there
exists $z$ such that $\delta(q_j, z) \cap F \ne \emptyset$. Then $\delta(q_0, xyz) \cap F \ne \emptyset$, so $M$ accepts $xyz$ and $y \in \mathrm{Fact}(L(M))$,
contradicting our assumption.

We have therefore shown
Corollary 10. The decision problem 

Given a finite list of square Boolean matrices of the same dimension, is some product equal 
to the all-zeros matrix? 
is PSPACE-complete. 
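The remark that the Boolean problem is "evidently solvable" can be illustrated by exhaustively closing the given matrices under Boolean multiplication. The sketch below is a brute-force illustration, assuming matrices are given as nested sequences of 0/1 entries; the names `bool_mul` and `some_product_is_zero` are hypothetical. Its running time can be exponential in the dimension, which is in line with the PSPACE-completeness result rather than in conflict with it.

```python
def bool_mul(A, B):
    """Boolean matrix product of two n-by-n matrices stored as tuples of tuples."""
    n = len(A)
    return tuple(
        tuple(int(any(A[i][k] and B[k][j] for k in range(n))) for j in range(n))
        for i in range(n)
    )

def some_product_is_zero(matrices):
    """Search all nonempty products of the given Boolean matrices for the zero matrix."""
    gens = [tuple(tuple(int(bool(x)) for x in row) for row in M)
            for M in matrices]
    n = len(gens[0])
    zero = tuple(tuple(0 for _ in range(n)) for _ in range(n))
    seen, stack = set(gens), list(gens)
    while stack:
        P = stack.pop()
        if P == zero:
            return True
        for G in gens:
            R = bool_mul(P, G)
            if R not in seen:      # finitely many Boolean matrices, so this halts
                seen.add(R)
                stack.append(R)
    return False
```

For example, the single matrix with rows `[0, 1]` and `[0, 0]` squares to zero, while a permutation matrix never does.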

4.4 Universality for subwords 

We now consider the problem of determining, given an NFA $M$, whether $\mathrm{Subw}(L(M)) = \Sigma^*$.

Lemma 11. Let $M = (Q, \Sigma, \delta, q_0, F)$ be an NFA such that (a) every state is reachable from
$q_0$ and (b) a final state is reachable from every state. Then $\mathrm{Subw}(L(M)) = \Sigma^*$ if and only if
the transition diagram of $M$ has a strongly connected component $C$ such that, for each letter
$a \in \Sigma$, there are two states of $C$ connected by an edge labeled $a$.

Proof. Suppose the transition diagram of $M$ has a reachable strongly connected component
$C$ with the given property. Then to obtain any word $w$ as a subword of a word in $L(M)$, use
a word to enter the strongly connected component $C$, and then travel successively to states
of $C$ where there is an arrow out labeled with each successive letter of $w$. Finally, travel to
a final state.

For the converse, assume $\mathrm{Subw}(L(M)) = \Sigma^*$, but the transition diagram of $M$ has no
strongly connected component with the given property. Then since any directed graph can be
decomposed into a directed acyclic graph on its strongly connected components, we can write
any $w \in L(M)$ as $x_1 y_1 x_2 y_2 \cdots x_n$, where $x_i$ is the word traversed inside a strongly connected
component, and $y_i$ is the letter on an edge linking two strongly connected components.
Furthermore, $n \le N$, where $N$ is the total number of strongly connected components. If
$\Sigma = \{a_1, a_2, \ldots, a_k\}$, then $\mathrm{Subw}(L(M))$ omits the word $w = (a_1 a_2 \cdots a_k)^N$, because the
first component encountered has no transition on some letter $a_i$, so reading $a_1 a_2 \cdots a_k$ either
forces a transition to (at least) the next component of the DAG, or in the case of an NFA,
ends the computational path with no move. Since there are only $N$ strongly connected
components, we cannot have $w$ as a subword of any accepted word. □

We can now prove 

Theorem 12. Given an NFA $M$ with input alphabet $\Sigma$, we can determine if $\mathrm{Subw}(L(M)) = \Sigma^*$
in linear time.

Proof. First, use depth-first search to remove all states not reachable from the start state. 
Next, use depth-first search (on the transition diagram of M with arrows reversed) to remove 
all states from which one cannot reach a final state. Next, determine the strongly connected 
components of the transition diagram of M (which can be done in linear time [13]). Finally, 
examine all the edges of each strongly connected component $C$ to see if for all $a \in \Sigma$, there
is an edge labeled $a$. □
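The procedure in this proof can be sketched as follows. This is a minimal sketch under stated assumptions: the NFA has no $\varepsilon$-moves, the alphabet is nonempty, `delta` maps `(state, letter)` to a set of states, and the strongly connected components are found with an iterative version of Kosaraju's two-pass algorithm (chosen here for brevity; any linear-time SCC algorithm would serve). The name `subw_universal` is hypothetical.

```python
def subw_universal(alphabet, delta, q0, finals):
    """Decide whether Subw(L(M)) = Sigma^* using the criterion of Lemma 11."""
    fwd, bwd = {}, {}
    for (p, _letter), targets in delta.items():
        for q in targets:
            fwd.setdefault(p, []).append(q)
            bwd.setdefault(q, []).append(p)

    def closure(starts, adj):
        seen, stack = set(starts), list(starts)
        while stack:
            p = stack.pop()
            for q in adj.get(p, ()):
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    # Keep states reachable from q0 from which a final state is reachable.
    useful = closure({q0}, fwd) & closure(set(finals), bwd)

    # First pass: record states in order of DFS completion.
    order, visited = [], set()
    for root in useful:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(fwd.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt in useful and nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(fwd.get(nxt, ()))))
                    break
            else:
                order.append(node)
                stack.pop()

    # Second pass: search the reversed graph in decreasing completion order.
    comp = {}
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in bwd.get(node, ()):
                if prev in useful and prev not in comp:
                    comp[prev] = root
                    stack.append(prev)

    # Collect the letters on edges that stay inside one component.
    labels = {}
    for (p, a), targets in delta.items():
        for q in targets:
            if p in useful and q in useful and comp[p] == comp[q]:
                labels.setdefault(comp[p], set()).add(a)
    return any(found >= set(alphabet) for found in labels.values())
```

On the two-state automaton for $(ab)^*$ the single component carries both letters, so the answer is positive; for $a^*b^*$ each component carries only one letter, and indeed $ba$ is not a subword of any accepted word.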

5 Shortest counterexamples



We now turn to the following question: given that $\mathrm{Pref}(L(M)) \ne \Sigma^*$, what is the length of
the shortest word not in $\mathrm{Pref}(L(M))$, as a function of the number of states of $M$? We can ask
the same question for suffixes, factors, and subwords.

Theorem 13. Let $M$ be a DFA or NFA with $n$ states. Suppose $\mathrm{Pref}(L(M)) \ne \Sigma^*$. Then
the shortest word not in $\mathrm{Pref}(L(M))$ is

(a) of length $\le n - 1$ if $M$ is a DFA, and there exist examples achieving $n - 1$;

(b) of length $\le 2^n$ if $M$ is an NFA, and there exist examples achieving $2^{cn}$ for some constant
$c$.



Proof. (a) If $M$ is a DFA with $n$ states, our construction shows that the complement of $\mathrm{Pref}(L(M))$ can be accepted
by a DFA $M'$ with $n$ states. If $M'$ accepts a string, it accepts one of length $\le n - 1$.

An example achieving this bound is $L = \{a^{n-2}\}$, which can be accepted by an $n$-state
DFA, and the shortest string not in $\mathrm{Pref}(L)$ is $a^{n-1}$.

(b) The upper bound is trivial (convert the NFA for $M$ to one for $\mathrm{Pref}(L(M))$; then convert
the NFA to a DFA and change accepting states to non-accepting and vice versa; such
a DFA has at most $2^n$ states).

The examples achieving $2^{cn}$ for some constant $c$ can be constructed using an idea in
[6]: there the authors construct an $n$-state NFA $M$ with all states final such that the
shortest string not accepted is of length $2^{cn}$. However, if all states are final, then
$\mathrm{Pref}(L(M)) = L(M)$, so this construction provides the needed example.

□ 

Theorem 14. Let $M$ be a DFA or NFA with $n$ states. Suppose $\mathrm{Suff}(L(M)) \ne \Sigma^*$. Then the
shortest word not in $\mathrm{Suff}(L(M))$ is of length $\le 2^n$. There exist DFA's achieving $e^{\sqrt{cn \log n}(1+o(1))}$
for a constant $c$, and there exist NFA's achieving $2^{dn}$ for some constant $d$.

Proof. The upper bound of $2^n$ is just like in the proof of Theorem 13. The example for DFA's
achieving $e^{\sqrt{cn \log n}(1+o(1))}$ for some constant $c$ can be constructed by using the construction
in Section 3.2 with each $M_i$ a unary DFA accepting the complement of $1^{p_i - 1}(1^{p_i})^*$ for primes $p_1 = 2$, $p_2 = 3$,
etc. The construction generates an automaton of $O(p_1 + p_2 + \cdots + p_n)$ states, and the shortest
word omitted as a suffix is of length $\ge p_1 p_2 \cdots p_n$.

For NFA's, we take the construction in the proof of Theorem 13 (b) and construct the
NFA for the reversed language. This can be done by reversing the order of each transition,
changing the initial state to final and all final states to initial. This creates a "generalized
NFA" with a set of initial states, but this can easily be simulated by an ordinary NFA by
adding a new initial state, adding $\varepsilon$-transitions to the former final states, and then removing
$\varepsilon$-transitions using the usual algorithm. This gives an example achieving $2^{dn}$ for some constant
$d$. □

Theorem 15. Let $M$ be a DFA or NFA with $n$ states. Suppose $\mathrm{Fact}(L(M)) \ne \Sigma^*$. Then
the shortest word not in $\mathrm{Fact}(L(M))$ is

(a) of length $O(n^2)$ if $M$ is a DFA, and there exist examples achieving $\Omega(n^2)$;

(b) of length $\le 2^n$ if $M$ is an NFA, and there exist examples achieving $2^{cn}$ for some constant
$c$.

Proof. (a) The bounds come from known results on synchronizing words [11, 12].

(b) The upper bound is clear. For an example achieving $2^{cn}$, we use a construction from
[6]. There the authors construct a "generalized" NFA $M$ of $n$ states with all states
both initial and final, such that the shortest string not accepted is of length $2^{cn}$.
Such an NFA can be converted to an ordinary NFA, as we have mentioned previously,
at a cost of increasing the number of states by 1. But for such an NFA, clearly
$\mathrm{Fact}(L(M)) = L(M)$, so the result follows.

□ 

We now turn to subwords. 

Theorem 16. Given a DFA or NFA $M$ of $n$ states, with input alphabet $\Sigma$, if $\mathrm{Subw}(L(M)) \ne \Sigma^*$,
then the shortest word not in $\mathrm{Subw}(L(M))$ is of length at most $n + 1$, and there exist examples
achieving $n$.

Proof. The upper bound is implied by our proof of Lemma 11. An example is provided by
choosing an alphabet of $n$ symbols, say $a_0, a_1, \ldots, a_{n-1}$, and constructing an NFA $M$ with
$n + 1$ states, say $q_0, q_1, \ldots, q_n$, where $q_n$ is accepting and all other states are nonaccepting,
such that there is a loop on state $q_i$ on all symbols except $a_i$, for $0 \le i < n$. Also, there is
a transition from $q_i$ to $q_{i+1}$ labeled $a_i$. Then $a_0 a_1 \cdots a_{n-1} a_0$ is not a subword of any word
accepted by $M$. □



6 Sets of finite words 

As we mentioned in the introduction, one motivation for this work was the problem of
testing whether (a) $S^\omega = \Sigma^\omega$, (b) ${}^\omega S = {}^\omega\Sigma$, or (c) ${}^\omega S^\omega = {}^\omega\Sigma^\omega$ for a finite set of words $S$. However,
our results thus far do not really resolve the worst-case complexity of these questions, for two
reasons. First, as we have seen, answering (a) involves testing if $\mathrm{Pref}(S^*) = \Sigma^*$ (and similarly
for (b), (c)), which means that to use our results, we must first construct a DFA or NFA
for $S^*$. While constructing a linear-size NFA for $S^*$ is computationally easy, we have no fast
algorithm for answering our questions in that case (although there clearly are
exponential-time algorithms). On the other hand, there are examples known where the smallest DFA
for $S^*$ is exponentially large in the size of $S$ (see [7]), so our polynomial-time algorithm for
prefixes and factors does not give an algorithm running in polynomial time in the size of $S$.

For prefixes and suffixes (cases (a) and (b) above), we can nevertheless obtain an efficient
algorithm. We state the result for prefixes only; the corresponding result for suffixes can be
obtained by reversing each word in $S$.

Theorem 17. We can test in linear time whether a finite set of finite words $S$ has the
property that $\mathrm{Pref}(S^*) = \Sigma^*$.

Proof. Let $k = |\Sigma|$. The following algorithm suffices: construct a trie from the words of $S$,
inserting each word successively. If at any point we attempt to insert a word $w$ such that
some already-inserted word $x$ is a prefix of $w$, do not insert $w$. Similarly, if at any point
we attempt to insert a word $w$ that is a prefix of an already-inserted word $x$, remove $x$ and
insert $w$ instead. Then $\mathrm{Pref}(S^*) = \Sigma^*$ if and only if every node in the trie has degree 0 or
$k$. □
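A minimal sketch of this test follows. Instead of removing words during insertion, it inserts every word, marks word ends, and then treats the first marked node on each branch as a leaf, which yields the same pruned trie. Assumptions made for the example: words are nonempty strings over `alphabet` (empty words are ignored), an empty set is rejected, and the name `pref_star_universal` is hypothetical.

```python
def pref_star_universal(words, alphabet):
    """Decide whether Pref(S^*) = Sigma^* for a finite set S of nonempty words."""
    k = len(set(alphabet))
    END = ""                      # marker key; cannot clash with a single letter
    root = {}
    for w in words:
        if not w:
            continue              # the empty word contributes nothing to S^*
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    stack = [root]
    while stack:
        node = stack.pop()
        if END in node:
            continue              # a word ends here: leaf of the pruned trie
        children = list(node.values())
        if len(children) != k:    # internal node must branch on every letter
            return False
        stack.extend(children)
    return True
```

For example, over the alphabet $\{0, 1\}$ the set $\{0, 10, 11\}$ passes the test, while $\{0, 10\}$ fails because no word of the set can continue after reading $11$.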

The problem of the complexity of determining, given a finite set of finite words $S \subseteq \Sigma^*$,
whether $\mathrm{Fact}(S^*) = \Sigma^*$, is still open.

We can also address the question of the shortest word not in $\mathrm{Fact}(S^*)$, given that
$\mathrm{Fact}(S^*) \ne \Sigma^*$.

Theorem 18. For each $n \ge 1$ there exists a set $S$ of finite words of length $\le n$, such that the
shortest word not in $\mathrm{Fact}(S^*)$ is of length $n^2 + n - 1$.

Proof. Let $S = \Sigma^n - \{0^{n-1}1\}$. Then it is easy to verify that the shortest word not in $\mathrm{Fact}(S^*)$
is $0^{n-1}1(0^n 1)^{n-1}$. □
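For small $n$ the claim can be checked by exhaustive search. The sketch below assumes $\Sigma = \{0, 1\}$ and relies on an observation that is not stated in the paper: since every word of $S$ has length $n$, and every word shorter than $n$ is both a prefix and a suffix of some word of $S$, a word $w$ lies in $\mathrm{Fact}(S^*)$ exactly when some alignment offset $r < n$ cuts $w$ into full blocks none of which equals $0^{n-1}1$. The names `in_fact_star` and `shortest_missing` are hypothetical.

```python
from itertools import product

def in_fact_star(w, n):
    """Is w a factor of S^*, where S = {0,1}^n minus {0^(n-1) 1}?"""
    bad = "0" * (n - 1) + "1"
    for r in range(n):
        # Full length-n blocks start at r, r + n, r + 2n, ...
        if all(w[i:i + n] != bad for i in range(r, len(w) - n + 1, n)):
            return True
    return False

def shortest_missing(n):
    """First word, by length and then lexicographically, not in Fact(S^*)."""
    length = 0
    while True:
        for letters in product("01", repeat=length):
            w = "".join(letters)
            if not in_fact_star(w, n):
                return w
        length += 1
```

The search is exponential in $n^2$, so it is only meant as a sanity check for very small values such as $n \le 3$.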

7 Afterword 

After this research was completed, we learned that some of the same questions in our
paper were recently and independently addressed in an unpublished paper of Pribavkina
[11]. In particular, she obtained a result similar to our Lemma 7 and a result more general
than our Theorem 18.